Results 1 - 20 of 171
1.
IEEE Trans Signal Process ; 72: 70-83, 2024.
Article in English | MEDLINE | ID: mdl-38283047

ABSTRACT

We introduce Cayley transform ellipsoid fitting (CTEF), an algorithm that uses the Cayley transform to fit ellipsoids to noisy data in any dimension. Unlike many ellipsoid fitting methods, CTEF is ellipsoid specific, meaning it always returns elliptic solutions, and can fit arbitrary ellipsoids. It also significantly outperforms other fitting methods when data are not uniformly distributed over the surface of an ellipsoid. Inspired by growing calls for interpretable and reproducible methods in machine learning, we apply CTEF to dimension reduction, data visualization, and clustering in the context of cell cycle and circadian rhythm data and several classical toy examples. Since CTEF captures global curvature, it extracts nonlinear features in data that other machine learning methods fail to identify. For example, on the clustering examples CTEF outperforms 10 popular algorithms.
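
The abstract above does not spell out how CTEF uses the Cayley transform, so the following is only a minimal sketch of the transform itself, not of the authors' algorithm. It illustrates the standard fact that a skew-symmetric matrix S is mapped to an orthogonal matrix Q = (I + S)^{-1}(I - S), which is one common way to parametrize a rotation such as the orientation of an ellipsoid's axes. The function names `cayley` and `gram` and the example matrix are assumptions made for illustration.

```python
def cayley(S):
    """Map a skew-symmetric matrix S to Q = (I + S)^{-1} (I - S)."""
    n = len(S)
    # Augmented system [I + S | I - S]; Gauss-Jordan elimination
    # turns the left block into I and the right block into Q.
    aug = [
        [(1.0 if i == j else 0.0) + S[i][j] for j in range(n)]
        + [(1.0 if i == j else 0.0) - S[i][j] for j in range(n)]
        for i in range(n)
    ]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def gram(Q):
    """Return Q^T Q, which is the identity when Q is orthogonal."""
    n = len(Q)
    return [
        [sum(Q[k][i] * Q[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical skew-symmetric input (S^T = -S) in three dimensions.
S = [[0.0, -0.5, 0.2],
     [0.5, 0.0, -0.3],
     [-0.2, 0.3, 0.0]]
Q = cayley(S)
G = gram(Q)  # numerically close to the 3 x 3 identity
```

Because I + S is always invertible for skew-symmetric S, the map is defined for every choice of the free parameters, which is what makes it convenient for unconstrained optimization over rotations.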

2.
Biometrics ; 79(4): 2987-2997, 2023 12.
Article in English | MEDLINE | ID: mdl-37431147

ABSTRACT

The transmission rate is a central parameter in mathematical models of infectious disease. Its pivotal role in outbreak dynamics makes estimating the current transmission rate and uncovering its dependence on relevant covariates a core challenge in epidemiological research as well as public health policy evaluation. Here, we develop a method for flexibly inferring a time-varying transmission rate parameter, modeled as a function of covariates and a smooth Gaussian process (GP). The transmission rate model is further embedded in a hierarchy to allow information borrowing across parallel streams of regional incidence data. Crucially, the method makes use of optional vaccination data as a first step toward modeling of endemic infectious diseases. Computational techniques borrowed from the Bayesian spatial analysis literature enable fast and reliable posterior computation. Simulation studies reveal that the method recovers true covariate effects at nominal coverage levels. We analyze data from the COVID-19 pandemic and validate forecast intervals on held-out data. User-friendly software is provided to enable practitioners to easily deploy the method in public health research.


Subject(s)
Communicable Diseases, Pandemics, Humans, Statistical Models, Epidemiological Models, Bayes Theorem, Communicable Diseases/epidemiology, Forecasting
3.
Neuroimage ; 276: 120214, 2023 08 01.
Article in English | MEDLINE | ID: mdl-37286151

ABSTRACT

Our understanding of the structure of the brain and its relationships with human traits is largely determined by how we represent the structural connectome. Standard practice divides the brain into regions of interest (ROIs) and represents the connectome as an adjacency matrix having cells measuring connectivity between pairs of ROIs. Statistical analyses are then heavily driven by the (largely arbitrary) choice of ROIs. In this article, we propose a human trait prediction framework utilizing a tractography-based representation of the brain connectome, which clusters fiber endpoints to define a data-driven white matter parcellation targeted to explain variation among individuals and predict human traits. This leads to Principal Parcellation Analysis (PPA), representing individual brain connectomes by compositional vectors building on a basis system of fiber bundles that captures the connectivity at the population level. PPA eliminates the need to choose atlases and ROIs a priori, and provides a simpler, vector-valued representation that facilitates easier statistical analysis compared to the complex graph structures encountered in classical connectome analyses. We illustrate the proposed approach through applications to data from the Human Connectome Project (HCP) and show that PPA connectomes improve power in predicting human traits over state-of-the-art methods based on classical connectomes, while dramatically improving parsimony and maintaining interpretability. Our PPA package is publicly available on GitHub, and can be implemented routinely for diffusion image data.


Subject(s)
Connectome, White Matter, Humans, Connectome/methods, Brain/diagnostic imaging
4.
J R Stat Soc Ser C Appl Stat ; 72(2): 254-270, 2023 May.
Article in English | MEDLINE | ID: mdl-37197290

ABSTRACT

We aim to infer bioactivity of each chemical by assay endpoint combination, addressing sparsity of toxicology data. We propose a Bayesian hierarchical framework which borrows information across different chemicals and assay endpoints, facilitates out-of-sample prediction of activity for chemicals not yet assayed, quantifies uncertainty of predicted activity, and adjusts for multiplicity in hypothesis testing. Furthermore, this paper makes a novel attempt in toxicology to simultaneously model heteroscedastic errors and a nonparametric mean function, leading to a broader definition of activity whose need has been suggested by toxicologists. Real application identifies chemicals most likely active for neurodevelopmental disorders and obesity.

5.
PLoS One ; 18(4): e0284904, 2023.
Article in English | MEDLINE | ID: mdl-37099536

ABSTRACT

Given a large clinical database of longitudinal patient information including many covariates, it is computationally prohibitive to consider all types of interdependence between patient variables of interest. This challenge motivates the use of mutual information (MI), a statistical summary of data interdependence with appealing properties that make it a suitable alternative or addition to correlation for identifying relationships in data. MI: (i) captures all types of dependence, both linear and nonlinear, (ii) is zero only when random variables are independent, (iii) serves as a measure of relationship strength (similar to but more general than R²), and (iv) is interpreted the same way for numerical and categorical data. Unfortunately, MI typically receives little to no attention in introductory statistics courses and is more difficult than correlation to estimate from data. In this article, we motivate the use of MI in the analyses of epidemiologic data, while providing a general introduction to estimation and interpretation. We illustrate its utility through a retrospective study relating intraoperative heart rate (HR) and mean arterial pressure (MAP). We: (i) show postoperative mortality is associated with decreased MI between HR and MAP and (ii) improve existing postoperative mortality risk assessment by including MI and additional hemodynamic statistics.


Subject(s)
Hemodynamics, Humans, Retrospective Studies, Heart Rate
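
The properties of mutual information listed in the abstract of entry 5 can be illustrated with a small sketch. The code below uses the simple plug-in (empirical frequency) estimator for discrete data; this is an assumption made for illustration and not necessarily the estimator used in the article, and it is known to be biased in small samples. The function name `mutual_information` and the toy samples are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Plug-in estimate of MI (in nats) between two discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # Sum over observed pairs of p(x, y) * log(p(x, y) / (p(x) * p(y))).
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


# Empirically independent: every (x, y) combination appears equally often.
xs = [0, 0, 1, 1]
ys = [0, 1, 0, 1]
mi_independent = mutual_information(xs, ys)

# Fully dependent: y is a relabeling of x, so MI equals the entropy of x.
# Categorical labels are handled exactly like numeric ones.
mi_dependent = mutual_information(xs, ["a", "a", "b", "b"])
```

In this toy case the first estimate is zero and the second equals log 2, the entropy of a balanced binary variable, which matches properties (ii) and (iv) in the abstract.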
6.
Cereb Cortex ; 33(9): 5307-5322, 2023 04 25.
Article in English | MEDLINE | ID: mdl-36320163

ABSTRACT

The selective vulnerability of brain networks in individuals at risk for Alzheimer's disease (AD) may help differentiate pathological from normal aging at asymptomatic stages, allowing the implementation of more effective interventions. We used a sample of 72 people across the age span, enriched for the APOE4 genotype to reveal vulnerable networks associated with a composite AD risk factor including age, genotype, and sex. Sparse canonical correlation analysis (CCA) revealed a high weight associated with genotype, and subgraphs involving the cuneus, temporal, cingulate cortices, and cerebellum. Adding cognitive metrics to the risk factor revealed the highest cumulative degree of connectivity for the pericalcarine cortex, insula, banks of the superior sulcus, and the cerebellum. To enable scaling up our approach, we extended tensor network principal component analysis, introducing CCA components. We developed sparse regression predictive models with errors of 17% for genotype, 24% for family risk factor for AD, and 5 years for age. Age prediction in groups including cognitively impaired subjects revealed regions not found using only normal subjects, i.e. middle and transverse temporal, paracentral and superior banks of temporal sulcus, as well as the amygdala and parahippocampal gyrus. These modeling approaches represent stepping stones towards single subject prediction.


Subject(s)
Alzheimer Disease, Humans, Alzheimer Disease/pathology, Magnetic Resonance Imaging, Brain/pathology, Genotype, Aging
7.
J Am Stat Assoc ; 118(544): 2521-2532, 2023.
Article in English | MEDLINE | ID: mdl-38501061

ABSTRACT

We aim at modeling the appearance of distinct tags in a sequence of labeled objects. Common examples of this type of data include words in a corpus or distinct species in a sample. These sequential discoveries are often summarized via accumulation curves, which count the number of distinct entities observed in an increasingly large set of objects. We propose a novel Bayesian method for species sampling modeling by directly specifying the probability of a new discovery, therefore, allowing for flexible specifications. The asymptotic behavior and finite sample properties of such an approach are extensively studied. Interestingly, our enlarged class of sequential processes includes highly tractable special cases. We present a subclass of models characterized by appealing theoretical and computational properties, including one that shares the same discovery probability with the Dirichlet process. Moreover, due to strong connections with logistic regression models, the latter subclass can naturally account for covariates. We finally test our proposal on both synthetic and real data, with special emphasis on a large fungal biodiversity study in Finland. Supplementary materials for this article are available online.
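
As a rough illustration of the accumulation curves mentioned in the abstract above, the sketch below counts the distinct labels seen in a growing sequence. It also evaluates the expected number of distinct labels under a Dirichlet process with concentration parameter theta, using the standard result that the (i+1)-th draw is new with probability theta / (theta + i). The function names and the toy sample are assumptions for illustration; the article's own model class is more general than this special case.

```python
def accumulation_curve(labels):
    """Number of distinct labels among the first k objects, for k = 1..n."""
    seen = set()
    curve = []
    for label in labels:
        seen.add(label)
        curve.append(len(seen))
    return curve


def dp_expected_distinct(theta, n):
    """Expected number of distinct labels after n draws from a Dirichlet process.

    The (i+1)-th draw is a new discovery with probability theta / (theta + i).
    """
    return sum(theta / (theta + i) for i in range(n))


# Hypothetical sequence of species labels.
sample = ["oak", "pine", "oak", "birch", "pine", "oak"]
curve = accumulation_curve(sample)
expected = dp_expected_distinct(theta=1.0, n=3)
```

Comparing an observed curve with the expected curve under a candidate discovery probability is one simple way to see whether a model's rate of new discoveries is plausible for the data.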

8.
Ann Appl Stat ; 16(3): 1380-1399, 2022 Sep.
Article in English | MEDLINE | ID: mdl-36465815

ABSTRACT

We introduce a new class of semiparametric latent variable models for long memory discretized event data. The proposed methodology is motivated by a study of bird vocalizations in the Amazon rain forest; the timings of vocalizations exhibit self-similarity and long range dependence. This rules out Poisson process based models where the rate function itself is not long range dependent. The proposed class of FRActional Probit (FRAP) models is based on thresholding a latent process. This latent process is modeled by a smooth Gaussian process and a fractional Brownian motion by assuming an additive structure. We develop a Bayesian approach to inference using Markov chain Monte Carlo and show good performance in simulation studies. Applying the methods to the Amazon bird vocalization data, we find substantial evidence for self-similarity and non-Markovian/Poisson dynamics. To accommodate the bird vocalization data in which there are many different species of birds exhibiting their own vocalization dynamics, a hierarchical expansion of FRAP is provided in the Supplementary Material.

10.
Ann Appl Stat ; 16(4): 2369-2395, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36425314

ABSTRACT

Reliably learning group structures among nodes in network data is challenging in several applications. We are particularly motivated by studying covert networks that encode relationships among criminals. These data are subject to measurement errors, and exhibit a complex combination of an unknown number of core-periphery, assortative and disassortative structures that may unveil key architectures of the criminal organization. The coexistence of these noisy block patterns limits the reliability of routinely-used community detection algorithms, and requires extensions of model-based solutions to realistically characterize the node partition process, incorporate information from node attributes, and provide improved strategies for estimation and uncertainty quantification. To cover these gaps, we develop a new class of extended stochastic block models (esbm) that infer groups of nodes having common connectivity patterns via Gibbs-type priors on the partition process. This choice encompasses many realistic priors for criminal networks, covering solutions with fixed, random and infinite number of possible groups, and facilitates the inclusion of node attributes in a principled manner. Among the new alternatives in our class, we focus on the Gnedin process as a realistic prior that allows the number of groups to be finite, random and subject to a reinforcement process coherent with criminal networks. A collapsed Gibbs sampler is proposed for the whole esbm class, and refined strategies for estimation, prediction, uncertainty quantification and model selection are outlined. The esbm performance is illustrated in realistic simulations and in an application to an Italian mafia network, where we unveil key complex block structures, mostly hidden from state-of-the-art alternatives.

11.
Article in English | MEDLINE | ID: mdl-35891979

ABSTRACT

High resolution geospatial data are challenging because standard geostatistical models based on Gaussian processes are known to not scale to large data sizes. While progress has been made towards methods that can be computed more efficiently, considerably less attention has been devoted to methods for large scale data that allow the description of complex relationships between several outcomes recorded at high resolutions by different sensors. Our Bayesian multivariate regression models based on spatial multivariate trees (SpamTrees) achieve scalability via conditional independence assumptions on latent random effects following a treed directed acyclic graph. Information-theoretic arguments and considerations on computational efficiency guide the construction of the tree and the related efficient sampling algorithms in imbalanced multivariate settings. In addition to simulated data examples, we illustrate SpamTrees using a large climate data set which combines satellite data with land-based station data. Software and source code are available on CRAN at https://CRAN.R-project.org/package=spamtree.

12.
Front Neurosci ; 16: 848654, 2022.
Article in English | MEDLINE | ID: mdl-35784847

ABSTRACT

Spatial navigation and orientation are emerging as promising markers for altered cognition in prodromal Alzheimer's disease, and even in cognitively normal individuals at risk for Alzheimer's disease. The different APOE gene alleles confer various degrees of risk. The APOE2 allele is considered protective, APOE3 is seen as control, while APOE4 carriage is the major known genetic risk for Alzheimer's disease. We have used mouse models carrying the three humanized APOE alleles and tested them in a spatial memory task in the Morris water maze. We introduce a new metric, the absolute winding number, to characterize the spatial search strategy, through the shape of the swim path. We show that this metric is robust to noise, and works for small group samples. Moreover, the absolute winding number better differentiated APOE3 carriers, through their straighter swim paths relative to both APOE2 and APOE4 genotypes. Finally, this novel metric supported increased vulnerability in APOE4 females. We hypothesized differences in spatial memory and navigation strategies are linked to differences in brain networks, and showed that different genotypes have different reliance on the hippocampal and caudate putamen circuits, pointing to a role for white matter connections. Moreover, differences were most pronounced in females. This departure from a hippocampal centric to a brain network approach may open avenues for identifying regions linked to increased risk for Alzheimer's disease, before overt disease manifestation. Further exploration of novel biomarkers based on spatial navigation strategies may enlarge the windows of opportunity for interventions. The proposed framework will be significant in dissecting vulnerable circuits associated with cognitive changes in prodromal Alzheimer's disease.

13.
Ann Appl Stat ; 16(2): 765-790, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35813556

ABSTRACT

Psychiatric studies of suicide provide fundamental insights on the evolution of severe psychopathologies, and contribute to the development of early treatment interventions. Our focus is on modelling different traits of psychosis and their interconnections, focusing on a case study on suicide attempt survivors. Such aspects are recorded via multivariate categorical data, involving a large number of items for multiple subjects. Current methods for multivariate categorical data, such as penalized log-linear models and latent structure analysis, are either limited to low-dimensional settings or include parameters with difficult interpretation. Motivated by this application, this article proposes a new class of approaches, which we refer to as Mixture of Log Linear models (mills). Combining latent class analysis and log-linear models, mills defines a novel Bayesian approach to model complex multivariate categorical data with flexibility and interpretability, providing interesting insights on the relationship between psychotic diseases and psychological aspects in suicide attempt survivors.

14.
Ann Appl Stat ; 16(1): 391-413, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35757598

ABSTRACT

Characterizing the shared memberships of individuals in a classification scheme poses severe interpretability issues, even when using a moderate number of classes (say 4). Mixed membership models quantify this phenomenon, but they typically focus on goodness-of-fit more than on interpretable inference. To achieve a good numerical fit, these models may in fact require many extreme profiles, making the results difficult to interpret. We introduce a new class of multivariate mixed membership models that, when variables can be partitioned into subject-matter based domains, can provide a good fit to the data using fewer profiles than standard formulations. The proposed model explicitly accounts for the blocks of variables corresponding to the distinct domains along with a cross-domain correlation structure, which provides new information about shared membership of individuals in a complex classification scheme. We specify a multivariate logistic normal distribution for the membership vectors, which allows easy introduction of auxiliary information leveraging a latent multivariate logistic regression. A Bayesian approach to inference, relying on Pólya gamma data augmentation, facilitates efficient posterior computation via Markov Chain Monte Carlo. We apply this methodology to a spatially explicit study of malaria risk over time on the Brazilian Amazon frontier.

15.
Bioinformatics ; 38(16): 4011-4018, 2022 08 10.
Article in English | MEDLINE | ID: mdl-35762974

ABSTRACT

MOTIVATION: It has become routine in neuroscience studies to measure brain networks for different individuals using neuroimaging. These networks are typically expressed as adjacency matrices, with each cell containing a summary of connectivity between a pair of brain regions. There is an emerging statistical literature describing methods for the analysis of such multi-network data in which nodes are common across networks but the edges vary. However, there has been essentially no consideration of the important problem of outlier detection. In particular, for certain subjects, the neuroimaging data are of such poor quality that the network cannot be reliably reconstructed. For such subjects, the resulting adjacency matrix may be mostly zero or exhibit a bizarre pattern not consistent with a functioning brain. These outlying networks may serve as influential points, contaminating subsequent statistical analyses. We propose a simple Outlier DetectIon for Networks (ODIN) method relying on an influence measure under a hierarchical generalized linear model for the adjacency matrices. An efficient computational algorithm is described, and ODIN is illustrated through simulations and an application to data from the UK Biobank. RESULTS: ODIN was successful in identifying moderate to extreme outliers. Removing such outliers can significantly change inferences in downstream applications. AVAILABILITY AND IMPLEMENTATION: ODIN has been implemented in both Python and R and these implementations along with other code are publicly available at github.com/pritamdey/ODIN-python and github.com/pritamdey/ODIN-r, respectively. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms, Neuroimaging, Humans, Brain/diagnostic imaging, Software
16.
Article in English | MEDLINE | ID: mdl-35162394

ABSTRACT

Humans are exposed to a diverse mixture of chemical and non-chemical exposures across their lifetimes. Well-designed epidemiology studies as well as sophisticated exposure science and related technologies enable the investigation of the health impacts of mixtures. While existing statistical methods can address the most basic questions related to the association between environmental mixtures and health endpoints, there were gaps in our ability to learn from mixtures data in several common epidemiologic scenarios, including high correlation among health and exposure measures in space and/or time, the presence of missing observations, the violation of important modeling assumptions, and the presence of computational challenges incurred by current implementations. To address these and other challenges, NIEHS initiated the Powering Research through Innovative methods for Mixtures in Epidemiology (PRIME) program, to support work on the development and expansion of statistical methods for mixtures. Six independent projects supported by PRIME have been highly productive but their methods have not yet been described collectively in a way that would inform application. We review 37 new methods from PRIME projects and summarize the work across previously published research questions, to inform methods selection and increase awareness of these new methods. We highlight important statistical advancements considering data science strategies, exposure-response estimation, timing of exposures, epidemiological methods, the incorporation of toxicity/chemical information, spatiotemporal data, risk assessment, and model performance, efficiency, and interpretation. Importantly, we link to software to encourage application and testing on other datasets. This review can enable more informed analyses of environmental mixtures. We stress training for early career scientists as well as innovation in statistical methodology as an ongoing need. Ultimately, we direct efforts to the common goal of reducing harmful exposures to improve public health.


Subject(s)
National Institute of Environmental Health Sciences (U.S.), Research Design, Environmental Exposure/analysis, Epidemiologic Methods, Epidemiologic Studies, Humans, Risk Assessment, United States
17.
J Am Stat Assoc ; 116(535): 1521-1532, 2021.
Article in English | MEDLINE | ID: mdl-34898761

ABSTRACT

This article is motivated by the problem of inference on interactions among chemical exposures impacting human health outcomes. Chemicals often co-occur in the environment or in synthetic mixtures and as a result exposure levels can be highly correlated. We propose a latent factor joint model, which includes shared factors in both the predictor and response components while assuming conditional independence. By including a quadratic regression in the latent variables in the response component, we induce flexible dimension reduction in characterizing main effects and interactions. We propose a Bayesian approach to inference under this Factor analysis for INteractions (FIN) framework. Through appropriate modifications of the factor modeling structure, FIN can accommodate higher order interactions. We evaluate the performance using a simulation study and data from the National Health and Nutrition Examination Survey (NHANES). Code is available on GitHub.

18.
Neuroimage ; 245: 118750, 2021 12 15.
Article in English | MEDLINE | ID: mdl-34823023

ABSTRACT

There has been a huge interest in studying human brain connectomes inferred from different imaging modalities and exploring their relationships with human traits, such as cognition. Brain connectomes are usually represented as networks, with nodes corresponding to different regions of interest (ROIs) and edges to connection strengths between ROIs. Due to the high-dimensionality and non-Euclidean nature of networks, it is challenging to depict their population distribution and relate them to human traits. Current approaches focus on summarizing the network using either pre-specified topological features or principal components analysis (PCA). In this paper, building on recent advances in deep learning, we develop a nonlinear latent factor model to characterize the population distribution of brain graphs and infer their relationships to human traits. We refer to our method as Graph AuTo-Encoding (GATE). We applied GATE to two large-scale brain imaging datasets, the Adolescent Brain Cognitive Development (ABCD) study and the Human Connectome Project (HCP) for adults, to study the structural brain connectome and its relationship with cognition. Numerical results demonstrate huge advantages of GATE over competitors in terms of prediction accuracy, statistical inference, and computing efficiency. We found that the structural connectome has a stronger association with a wide range of human cognitive traits than was apparent using previous approaches.


Subject(s)
Brain/growth & development, Brain/physiology, Cognition/physiology, Connectome/methods, Magnetic Resonance Imaging, Adolescent, Algorithms, Child, Computer Simulation, Datasets as Topic, Female, Humans, Three-Dimensional Imaging, Male, Neurological Models, Nonlinear Dynamics, Phenotype, Reading, Young Adult
19.
Article in English | MEDLINE | ID: mdl-35754924

ABSTRACT

Statistical methods relating tensor predictors to scalar outcomes in a regression model generally vectorize the tensor predictor and estimate the coefficients of its entries employing some form of regularization, use summaries of the tensor covariate, or use a low dimensional approximation of the coefficient tensor. However, low rank approximations of the coefficient tensor can suffer if the true rank is not small. We propose a tensor regression framework which assumes a soft version of the parallel factors (PARAFAC) approximation. In contrast to classic PARAFAC where each entry of the coefficient tensor is the sum of products of row-specific contributions across the tensor modes, the soft tensor regression (Softer) framework allows the row-specific contributions to vary around an overall mean. We follow a Bayesian approach to inference, and show that softening the PARAFAC increases model flexibility, leads to improved estimation of coefficient tensors, more accurate identification of important predictor entries, and more precise predictions, even for a low approximation rank. From a theoretical perspective, we show that employing Softer leads to a weakly consistent posterior distribution of the coefficient tensor, irrespective of the true or approximation tensor rank, a result that is not true when employing the classic PARAFAC for tensor regression. In the context of our motivating application, we adapt Softer to symmetric and semi-symmetric tensor predictors and analyze the relationship between brain network characteristics and human traits.
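
The classic PARAFAC structure described in the abstract above, in which each coefficient entry is a sum over rank components of products of row-specific contributions, can be sketched for the simplest two-mode case. This is a minimal illustration under assumed toy values, not the Softer model itself: the function name `parafac_matrix` and the factor values are hypothetical, and the soft version would additionally let each contribution vary around these row-specific means.

```python
def parafac_matrix(factors_mode1, factors_mode2):
    """Rank-R PARAFAC coefficient matrix: B[i][j] = sum_r u_r[i] * v_r[j].

    factors_mode1[r] and factors_mode2[r] hold the row-specific
    contributions of rank component r along each of the two modes.
    """
    rows = len(factors_mode1[0])
    cols = len(factors_mode2[0])
    return [
        [
            sum(u[i] * v[j] for u, v in zip(factors_mode1, factors_mode2))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


# Two rank components for a 2 x 3 coefficient matrix (toy values).
u = [[1.0, 2.0], [0.0, 1.0]]              # mode-1 contributions, length 2
v = [[1.0, 0.0, -1.0], [2.0, 1.0, 0.0]]   # mode-2 contributions, length 3
B = parafac_matrix(u, v)
```

With R components the matrix has rank at most R, which is the restriction the abstract says can hurt estimation when the true coefficient tensor is not low rank.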

20.
Article in English | MEDLINE | ID: mdl-35754922

ABSTRACT

Many modern data sets require inference methods that can estimate the shared and individual-specific components of variability in collections of matrices that change over time. Promising methods have been developed to analyze these types of data in static cases, but only a few approaches are available for dynamic settings. To address this gap, we consider novel models and inference methods for pairs of matrices in which the columns correspond to multivariate observations at different time points. In order to characterize common and individual features, we propose a Bayesian dynamic factor modeling framework called Time Aligned Common and Individual Factor Analysis (TACIFA) that includes uncertainty in time alignment through an unknown warping function. We provide theoretical support for the proposed model, showing identifiability and posterior concentration. The structure enables efficient computation through a Hamiltonian Monte Carlo (HMC) algorithm. We show excellent performance in simulations, and illustrate the method through application to a social mimicry experiment.
